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Nonsmooth Formulation of the Support Vector 
Machine for a Neural Decoding Problem 
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■ Abstract 



This paper formulates a generalized classification algorithm with an applica- 
tion to classifying (or 'decoding') neural activity in the brain. Medical doctors 
and researchers have long been interested in how brain activity correlates to 
body movement. Experiments have been conducted on patients whom are un- 
able to move, in order to gain insight as to how thinking about movements 
' might generate discernable neural activity. Researchers are tasked with deter- 

mining which neurons are responsible for different imagined movements and 
how the firing behavior changes, given neural firing data. For instance, imag- 
ined movements may include wrist flexion, elbow extension, or closing the hand. 
This is just one of many applications to data classification. Though this article 
deals with an application in neuroscience, the generalized algorithm proposed 
ly^ I in this article has applications in scientific areas ranging from neuroscience to 

On ■ acoustic and medical imaging. 

o 



1 Introduction 

This paper deals with the problem of classifying neural activity that correlates to 
specific imagined movements given data recorded from an electrode array implanted 
in the primary motor cortex of the human brain. We are motivated by the training 
methods and neural decoding algorithms that have been developed by the Battelle 
Memorial Institute for the Braingate project as in [1]. The goal of the technology is 
to isolate and predict arm/hand movements a given patient is thinking about from 
signals processed by an electrode array. For this particular problem, we are trying to 
identify the active neurons, yet the data largely consists of measurements correspond- 
ing to the neurons at "rest". In order to identify the neurons, we must first classify 
the data by movement type. We have 96 neural channels due to the construction of 



*Naval Surface Warfare Center, Panama City Beach, FL and Department of Mathematics, North 
Carolina State University, Raleigh, NC 

^Center for Research in Scientific Computation, Department of Mathematics, North Carolina 
State University, Raleigh, NC 

■l-Battelle Memorial Institute, Columbus, OH 



1 



the Utah electrode array being utihzed, and the training data contains neural firing 
reactions to a person simply thinking about the movements. The overall goal of this 
neural network study is to identify the active neurons that are most responsible for 
each movement. Our objective is to have a simple linear classifier that incorporates 
sparsity in the formulation. Sparsity optimization is becoming increasingly utilized 
due to storage and implementation considerations (see for instance IHHI]). Not only 
that, but simple solutions are more robust to noise than exact solutions. By sparsity, 
we mean a solution with the fewest number of nonzeros necessary to capture the core 
properties of the solution. Mathematically, sparsity is represented by Ip norms for 
< p < 1. To this end, we obtain a simple linear classifier that, though simple, 
has desirable characteristics for classifying data. In general, classification is a highly 
useful but difficult problem. For certain applications, such as the one considered in 
this paper, the difficulties can be compounded when the data is dynamic in time. We 
address some necessary improvements to widely used classifiers that only minimally 
increase computational effort. 

Classifying the movements from the neural data can be formulated as a nonsmooth 
constrained minimization problem which leads to a, possibly ill-posed, inverse prob- 
lem. There have been several methods proposed for classification problems of this 
type. In this paper, we propose and implement several improvements to such classifi- 
cation methods. We propose a new formulation using sparsity for the weights of the 
classifier which yields a sharper classification due to the noble use of the nonsmooth 
performance index (see [SI [?]). Our formulation is based on the method proposed in 
[2] and the improved least squares formulation developed in [5]. Utilizing the £p-norm 
for < p < 1 enables us to reduce the number of classified data and, hence, reduce 
the misclassification of erroneous data. In terms of the application considered in this 
paper, this approach improves the sparsity of the solutions in terms of the identi- 
fied neurons for each movement. For this application, the Proximal Support Vector 
Machine(PSVM) was previously used due to the fact that it offers a closed form so- 
lution and has a computationally efficient implementation. It will be demonstrated 
that our classification method attains solution sparsity, improves separability of the 
classes and is able to account for bias in the data. In order to improve the separa- 
bility of the classes, we employ the penalty formulation of the inequality constraint 
in the linear Support Vector Machine(SVM). Due to the large amount of data for 
periods of rest, the standard classification algorithm could have a bias towards iden- 
tifying the "rest" state, hence reducing the identification of neurons corresponding 
to an imagined movement. To counteract this effect, we weight the error function 
accordingly. Moreover, we propose a new measure of performance for identifying the 
responsible neurons once the approximate solution has been computed. The power 
of the methods presented in this paper is due to the fact that not only is the data 
classified, but the strength of the classification in terms of the separation is obtained 
from the same method. That is, without extra work one obtains a measure of how 
the classifier performs along with the classifier. We are motivated by the particular 
Braingate application, however the formulation developed here has the potential for 
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improving classification for a wide range of applications. 

The paper is organized as follows. In Section |2l we briefly discuss the SVM 
and PSVM, pointing out in what way the methods can be improved. Section [3] 
contains our approach for addressing the necessary improvements, which is largely 
based on sparsity optimization. In Section H] we present a choice rule for selecting 
the regularization parameter, which can be used for any of the methods discussed in 
this paper. Finally, in Section [5l we present results from applying the PSVM and the 
sparse approach to real world neural firing rate data. 



2 The Support Vector Machine 

In this section, we give a basic outline of the Support Vector Machine (SVM) algo- 
rithms. We are given training data a set of n points of the form 

V = di) I X, G d, G {-1, l}}t, 

where the di is either 1 or -1. We want to find the maximum- margin hyperplane that 
divides the points having di = 1 from those having di = —1. Any hyperplane can be 
written as the set of points x satisfying 

Xi ■ w — 'J = 0. 

To this end the linear SVM determines the hyperplane (10,7)"'' by the constrained 
minimization; 

subject to di{xi ■ w — 7) > 1 — Hi > 

where yi measures the degree of misclassification and > is a chosen parameter. 
That is, the SVM algorithm classifies data into two categories, Q~ and geomet- 
rically separated by the plane {x : x ■ w = j}, and clustered around the two planes 

= {x G M™ : X ■ w - 7 < -1} 

(2.2) 

fi+ = {x G M"" : X • w - 7 > +1}. 

The classes and data V for this particular application will be discussed in Section [5] 
to follow. 

The authors of |5] formulate the inequality y > in terms of a penalty, 

min Trbr + 7t(|^|^ + 7^) subject to -D(Aty — 7e) + y > e, (2.3) 
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where D = diag{di) and A G M"^™ with rows Ai = Xi, 1 < i < n. Furthermore, 
the PSVM algorithm in |5] replaces the inequality as the equality constraint and 
formulate the unconstrained minimization 

min 2 1^1^ 2^^^^^^ ~'~ subject to D(v4w — 7e) + y = e. (2.4) 
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Our formulation is also an unconstrained minimization of the form 

min — + -(I'U^P + 7^) subject to /^(Aw — 7e) + y = e, (2.5) 

where = max(0,yj). Our motivation for choosing in this manner can be 
understood by the following simple argument. Note that if ?/i > in (12.41) then 

di{xi ■ w - -f) = 1 - Vi 

and thus yi is the degree of misclassification. However, if ?/j < 

di{xi ■w--f) = l-yi>l 

and thus the case is allowed. We are motivated by this fact to only penalize yl' = 
max(0,?/j) in the formulation (12. 5p . In this sense the formulation (12.51) is penalizing 
the inequality constraint of (12. ip . It should improve the separability of the classes 
based on the least squares formulation (12. 4p . 

But, the advantage of (12. 4p is that it has the closed form solution {w 7)""". That 
is, (12.41) is equivalent to 

^\Hiw 7)T-ep + l(|i^|2 + f) 

where H = D[A — e] (i.e. y = e — H{w 7)""") and thus 

u = (u) 7)T = (/ + z/ WH)-^We. 

However, it will be shown in Section |3] that the formulation (12. 5 p has an efficient 
implementation as well. 

An important consideration for classification problems of this form is the possibil- 
ity of ill-posedness. If H is very ill-conditioned, i.e. the singular values of H decrease 
very rapidly to zero, then the solution is very sensitive to the selection of > 0. 
Thus, we need to develop a selection rule for i/, which will be addresed in Section |4] in 
a more general set-up. The second term |(|if P + 7^) in (12. 4p represents the 2-norm 
of u = (ly 7)""". It is more reasonable to use some other norms to obtain a desirable 
classifier. One of our requirements is that fewer nonzero components of w are in the 
final solution. To satisfy this requirement, we use the P' norm with < p < 1 for our 
formulation in Section |3] to obtain the sparse solution. The nonzero components of w 
represent the essential and critical neurons for classifying the specified movement. In 
this way we can obtain the neural network information of the Braingate technology. 
This point will be examined in the numerical tests presented in Section [51 



3 New Approach 

Our general classifier can be written as the optimization problem 



mill 0(?/) + /??/^(u),7), 

(3.1) 

y = e — Hu 
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where we assume (^,■0 are lower semi-continuous, as in [6], [7]. Typically 4> and ■0 
are chosen to be some norm on (in general, a Banach space X). For instance, 

in the PSVM we have (j){y) = \y\l and tp{w,'^) = -{\w\l + |7p), for X = M", with 

/3 = -. As stated in the previous section, the choice of the 2-norm is often chosen for 
ease of computation and to guarantee the closed form solution, however, statistically 
we should consider other norms. The choice of and i/) depends on the desired 
properties of the solution, i.e., respectively, the choices of and i/j represent the 
class the solution {w, should belong to and the noise statistic, which may not be 
known. Recall that we wish to address three aspects of the classification algorithms 
typically used. Each aspect will be considered separately, however the final algorithm 
incorporates all elements. 

3.1 Improving sparsity via £p minimization 

The first aspect considered is the sparsity of the weights w corresponding to the 
neurons. The weights w largely consist of insignificant coefficients, which one may 
desire to "weed" out, or effectively remove. We introduce sparsity by developing the 
p-norm method which weeds out unnecessary weights and selects the responsible data 
for each class. For the p-norm method we choose 

^ m 

?/^(w;,7) = |«;|^ + -I7I2 where \w\'^ = ^\wi\P (3.2) 

i=l 

for < p < 1. The p-norm minimization has the effect of selecting the desired data. 
In fact, it can be shown that 

l^lp ^ ^i'^ of nonzero elements of as p — t- 0+. 

Thus, i/j enhances sparsity in the solution w as p ^ 0^. In other words, the choice 
of the p-norm with p < 1 removes weights corresponding to neurons which are not 
active with respect to the established baseline. 

One can develop the necessary optimality condition for (13. ip with (13. 2p despite 
the fact that |tfj|p is not differentiable at Wi = 0. Specifically, let us consider the case 
when p = 1. For p = 1 we have 

o\w\ = - — - 
\w\ 

at w 0, however for w = we have the subdifferential 

d\w\ = 1-1,1] 

where the subdifferential of a functional f : X ^ {—00, 00] at x G is defined as the 
set 

{x* e X*\f{z) > f{x) + {x\z-x),'iz E X}, (3.3) 
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which can be found in [HI [TO] for example. However, for p < 1 we have d\w\^ 
when z/; = 0. To remedy this, for e ^ 1, we take the approximation 



p W 



d\w 



which approximates the formal derivative 



d\w\l 



w 



\w 



\2-p 



(3.4) 



(3.5) 



for any value of w. One can see a depiction of the approximate derivative dslwl in 
Figure [3?T] with e = lOe"^ and e = lOe"^. 



- approximate derivative fort-10e-3 

- approximate (derivative forE=10e-2 




-0.4 -0.3 -0.2 -0.1 0.1 0.2 0.3 0.4 0.5 



Figure 3.1: Comparison of approximate sub differentials for e = lOe ^ and e = lOe 



Based on this, consider the iterative algorithm of the form 



/3p 



max(£2~P, liy'^p-p 



-w 



k+l 



H*e 



(3.6) 



to find the minimizer of (13.10 with (13. 2p . It can be shown that this sequence converges 
to the minimizer of the appropriate cost functional, which is discussed in Appendix 

m 



3.2 Improving separability via inequality constraint 

We now present an algorithm which indirectly utilizes the inequality D{Aw — e^)-\-y > 
e for the optimization of {w,'jy. For this approach, we consider the constrained 
minimization 



subject to D{Aw — 67) + y = e 



mm 

(«),7) 



(3.7) 
(3.8) 
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where we design (j) to incorporate the weighting for regions where {Aw — ej) + y < e 
and {Aw — ej) + y > e. In this case, the functional is taken to be 

(l){y) = ^mm{0,Hu- ef. (3.9) 

Here, we are attempting to create more of a division between the two classes of data, 
so that the data is classified more distinctly. The choice (13. 9p works well since if y > 
we will have 

{x ■ w - -f) = 1 - yi 

which means that yi is the degree of misclassification, however, if yi < we have 

{xi ■ w - -f) = 1 - yi > 1 
which is desirable since the data is pushed farther away from 1. 

3.3 Weights for reducing bias 

Due to the large amount of data for periods of rest, the standard classification al- 
gorithm could have a bias towards identifying the "rest" state, hence reducing the 
identification of an imagined movement. The rest state corresponds to data for which 
di = —1. Hence, it is reasonable to consider incorporating different weights for the 
two cases di = —1 and di = 1. To reduce the bias towards identifying coefficients for 
which di = —1, we choose a parameter a < 1 for the weight corresponding to this 
data. In that way, for di = —1, we weight the data by selecting a parameter a and 
we define the norm 

m = l\\s'/'iHu-e)r 

where the matrix S is defined by 

( Sii = l ii di = 1 
\ Sii = a if di = -1 

in order to account for the bias. 

3.4 Algorithm 

We now provide the details of the numerical implementation of the methods discussed 
in Section [3] which contains all the desired properties discussed. For all algorithms 
presented in this section, the Tikhonov regularization parameter is /3 and we define 
the matrix 

H = D[A -e]. (3.10) 

We develop an iterative algorithm which incorporates all aspects given in this 
section, based on the iterative method (13.61) . In summary, the iterative method for 
computing the classifier (w, 7)""" is given by 
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(3.11) 



rf, = max(0, 1-4(2:^-7)), ^ 



for some small e > 0. Each step involves forming the diagonal matrices F, T and solv- 
ing the linear equation for {w^^^ , ^^^^Y . For the application considered in this paper, 
convergence is achieved with a relatively low number of iterations(3 or 4 is reason- 
able), so that the complexity of the proposed algorithm nearly equals the complexity 
of the PSVM. Thus, the advantages of this approach are realized with only a small 
increase in computational cost for our application. Note that (13. lip is equivalent to 
the PSVM formulation if we set 5, T, F = J G M"^"^. 



4 Choice rules for the regularization parameter 

One of the questions that arises is how to choose the parameter = ^ optimally, so 
that the best solution is obtained. Naturally, the choice of /3 will depend on the choices 
of the functionals (j) and tp. We present here not only the well known choice rule due 
to Morozov, but also a choice rule developed in |[6| f7] which has fewer assumptions 
than the well known Morozov's discrepancy principle. 

If one knows the noise level in the data, then a well known and useful choice rule 
is the Morozov's Discrepancy Principle. This choice rule works quite well in the case 
of known noise levels. For the optimal solution up, the Morozov discrepancy principle 
seeks /3 > such that 

(p{uf^,y) ~ S 

where 6 is the noise level defined by 

\y - y^\y < 5 

for exact data, y, and noisy data, y^. Here, 6 can be thought of as the performance 
level of the optimal solution. 

It should be noted, that the approximate solutions — )■ m in the case that 5 — )■ 0. 
Though the approximations converge, for problems such as classification, other choice 
rules may provide better performance. This has led to extensive research for the goal 
of developing choice rules which are able to automatically tune the parameter, for 
the particular problem. 

We now present a choice rule for automatically selecting the parameter 13 based 
on the choices of 0, ^Z'. Consider maximizing the conditional density p{{u,T, X)\y) ~ 
p{y\{u,T, X))p{u,T, X) where (r, A) are density functions for 0, "0, respectively, both 
having Gamma distribution. The balancing principle is derived from the Bayesian 
inference [H] 

min T(f){u, y) + Xi/j^u) + PqX — In A + j3iT — cxi In r. 

{n,r,A) 
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Letting /3 = ^, the necessary optimality is given by 

Ujs = arg mm{(/){u, y) + 

u 

X= ^° , 
ai 

T = — 

(f){u,3) + j3i 

or 

Up = arg min{^(u, y) + I3ip{u)} (4.1) 



/3 = l^M±i^, (4.2) 

^?/^(M;3) "0- 

A variational formulation of the balance principle fl4.2p is developed in [HI E]. The 
theoretical justification for the balance principle follows from the variational formula- 
tion. Good choice rules for regularization parameters become increasingly important 
as the data becomes increasingly noisy and, likewise, when the problem is highly 
ill-posed. In the case that the statistic n is unknown, an algorithm for estimating 
is discussed in [6l [7] . 

The natural choice for updating the parameter /3 would be the fixed point iteration 

1. Set k = and choose (3o 

2. Solve for Uk 

arg m.m{(j){u) + f^ki^i^u)} 

u 

3. Update the regularization parameter j3k+i by 

1 (l){Uk) 



fc+i 



4. Check the stopping criterion. If convergence is not met, set k = k + 1 and 
repeat from Step [2l 



5 Numerical Results 

In this section, we provide numerical results from the application of our approach to 
the neural classification problem outlined in the Introduction. We first provide an 
outline of how the experiment is conducted, as it will be important for deciphering 
the results provided. The experiment proceeds by asking a person think about a 
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specific movement such as wrist flexion, elbow extension, or closing the hand. The 
data obtained is neural firing rate data, meaning that this is a time dynamical data 
classification problem. It is desired to determine the correlation between imagining 
a movement and the neural response to imagining the movement. The experiment 
consists of periods of rest and periods where the person is cued to think about a certain 
movement. For this particular experiment, there are five wrist movements consisting 
of wrist extension, wrist flexion, wrist radial deviation, wrist ulnar deviation, and 
closing hand. The patient is cued to imagine one of the movements consecutively, 
with periods of rest between each cue. The person is then cued for another of the 
five movements consecutively, again with periods of rest in between each cue. For 
instance, an example set of cues for the movement of wrist up can be seen in Figure 
15.11 The period of time after the cues for wrist up consists of both rest and cues for 
other movements. This is the nature of how the experiment is conducted. Data is 
collected for a specified interval of time, which consists of periods of rest and periods 
of cues for each movement. The goal of the data classification is to sharply separate 
the data for each movement. For example, we must separate the data corresponding 
to wrist "up" (extension) from the data corresponding to both the rest periods and the 
periods for other movements. Thus, the results given here are only for one particular 
movement, however the method produces similar results for the other five movements. 



- wrist down 

- wrist left 

- wrist right 

- hand closed 



Figure 5.1: Cues for the first patient. 

We also develop and implement a new performance measure utilizing the force 

F = Aw--f. (5.1) 

Often, the performance measure is taken to be 

( 1 ifF>0 
sgn(F) = < if F = (5.2) 
( -1 ifF<0 

pointwise in time, which could have a tendency to include outliers. Instead, we 
consider other options for determining the performance. One such option is to sum 
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the force over a time interval. That is we define a performance measure P by 



k+n 

E 



for some n (e.g. 



n 



5). One could also take the pointwise average as 



2n+l ' 



which 



is equivalent to P as a performance measure since we are interested in the sign of 
that data. We call P the summed performance measure. Another option would be to 
average the force over each time interval of cues and rest periods. This performance 
measure is given by 



Pk 



1 ^ 



(5,3) 



where [ti . . .t^) is the interval of time for the cue or rest period, and is the number 
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(b) 



20 25 30 35 



(c) 



(d) 



Figure 5.2: (a) An example of the force F; (b) An example of the averaged force; (c) 



The sign function of the original force; (d) The sign function of the averaged force. 



of points. Note that we are taking the average over each interval for which the sign of 
the surrounding data is the same. Both P and P improve the detection since outliers 
are likely single points or small clusters. The process of summing the forces can cause 
outliers to cancel out over the interval. Likewise, the process of averaging over an 
interval decreases their effect. For example, one can see in Figure |5T2] how sign(F) 
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(a) 



(b) 



Figure 5.3: (a) Summed forced example (b) Averaged force example. 



has an outlier while sign(P) does not. Analyzing the force yields the detection of 
the responsible neurons for a given movement, and hence the performance of the 
algorithm. 

The two performance measures P and P are both likely good choices, however P 
may distinguish more clearly between the data classes, as can be seen in Figure 15.31 



5.1 Stroke Patient 

We now provide the numerical results for one of the patients who was left paralyzed 
after a stroke. As far as the two data sets used for this paper, the stroke patient's 
data is the easier of the two sets. We applied the modifications to the original PSVM 
separately and some in conjunction with one another. We provide comparisons of the 
force itself, the performance measure and we also include a heat map of the neural 
activity, which can be used for visualizing the active neurons. The cues for the action 
of 'wrist up' can be seen in Figure [57il for this particular patient. 

lOOi I I I I I |i I 1 1 1 1 1 1 
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Figure 5.4: Cues for the wrist up action. 
For clarity of the presentation, we separately consider the three improvements of 
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our formulation, namely the increased sparsity, reduced bias towards classifying the 
at rest data and the robustness of separating the classes. We provide results for each 
of the three separately, however the best implementation takes advantage of all three 
aspects. It should also be noted that all results for this patient are for the wrist up 
movement, unless otherwise stated. 

By taking the Ip minimization with p = .2 versus the £2 minimization we are 
able to increase the number of coefficients such that \wi\ < le — 4 from 1 to 18. The 
increase in sparsity is depicted in Figure 1531 where one can see how taking successively 
smaller values of p reduces the number of nonzero weights, w. We now illustrate how 




"0 10 20 30 40 50 60 70 80 



Figure 5.5: Comparison of weights for different £p norms. 

changing j3 can improve the force. By taking j3 = .2 one can see in Figure [521 that the 
change in force by taking the (.p minimization versus the (.2 minimization is negligible, 
however an increase in sparsity is still realized. 
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(a) 



(b) 



Figure 5.6: (a) Comparison of ^2 versus tp,p = .2 using summed force; (b) Comparison 



of £2 versus ip,p = .2 using averaged force. 



Now, let us consider the incorporation of a parameter a in order to reduce the 
bias towards classifying at rest data(or data for other movements). If the value of a 
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is chosen appropriately, the magnitude of the force increases where F > 0, while the 
increase in magnitude is minimal for F < 0. This is illustrated in Figure [5171 




We now present results for improving the robustness of separating the classes, via 
the choice fl3.9p . In this case, an increase in magnitude for F > and a decrease 
in magnitude for F < is realized. This is ideal, since it is desired to completely 
separate the classes. Note that the forces have been normalized to present a fair 
comparison. 




-PSVM 

- Formulation using fl)=.5 max(0,y) 
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(a) 



(b) 



Figure 5.8: (a) Summed force after minimization of (j){y'^) (normalized) for a = l,p 



1; (b) Averaged force after minimization of (j){y~^) (normalized) for a = l,p = 1. 



Next, we display a comparison of the summed force, P, for the PSVM formulation 
versus the nonsmooth formulation developed in this paper, in Figure 15.91 Note that 
the PSVM is the same as ([23]) with S,T,T = I e M'^^'^. As can be seen, the 
separation of the considered movement from all other movements is improved. 

As a test of the classifier, we train on the first three cues for a specified movement 
and test the classifier on the fourth cue. The test results can be seen in Figure 15.101 
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Figure 5.9: Comparison of PSVM and nonsmooth formulation for a = .5,p = .2. 




Nonsmooth Formulalion 

PSVM 

"0 5 10 15 20 25 30 35 40 45 50 




(a) 



(b) 



Figure 5.10: (a) Test of nonsmooth formulation for wrist down for a = .l,p = 2; (b) 



Test of nonsmooth formulation for wrist right for a = .l,p = 2. 

for two movements. As one can see, the classification is nearly equal for both the 
PSVM and our nonsmooth formulation. 



5.2 Lou Gehrig's Disease Patient 

We now provide results for a patient affected by Amyotrophic Lateral Sclerosis(ALS), 
or Lou Gehrig's Disease. The majority of the results are similar to those obtained 
for the stroke patient, with the exception of one outlier in the results. Several ex- 
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planations should be considered as to why the outher appears in the results. One 
potential explanation is that the patient suffered from performance degradation due 
to the increased length of this particular experiment. Secondly, the nature of this 
patients physical disability is different from that of the first patient, so there may 
be difficulty with respect to the natur of the disease itself. Even with this outlier, 
the results are improved, considering that the outlier is a single spike in the results 
and can likely be ignored when analyzing the data. This particular data set consists 
of two periods of cues, as opposed to the one period for the first data set. Being 
the more difficult of the two data sets, this truly tests the power of our nonsmooth 
formulation. 
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Figure 5.11: Cues for the wrist up action. 



One can see similar results for this patient in Figures 15.121 and 15.131 where one 
can see the circled outlier in Figure I5.13bl This particular example shows how the 
tuning of the parameter a can affect the classification results. If a is chosen to be too 
small (e.g. a = .1 for this data set) then the appearance of outliers may increase. 
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(a) 



(b) 



Figure 5.12: (a) Comparison of £2 versus £p using summed force for p = .2; (b) 



Comparison of £2 versus £p using averaged force for p = .2. 
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As a test of the classifier, we train on the first set of cues for a specified movement 
and test the classifier on the second set of cues. The test results can be seen in 
Figure 15.141 for one movement. As one can see, the classification improves by using 
the nonsmooth formulation versus the PSVM. 




(a) 



(b) 



Figure 5.14: (a) Training corresponding to test of nonsmooth formulation for a 
.l,p = 2; (b) Test of nonsmooth formulation for wrist up for a = .l,p = 2. 



5.3 Identifying the Physical Location via Heat map 

Once the data has been classified, one can determine the responsible neurons for a 
given movement, based on the classification. To accomplish this, we form a heat map 
of the weights, corresponding to each neuron. Each unit on the implanted patch 
consists of up to four neurons. For this particular problem, we are given a 10 x 10 
matrix, M, representing the physical location of the neurons with respect to the 
implanted patch. Using this matrix M we map the weights, to the corresponding 
neuron's location on the patch. A depiction of the heat map for several cases can 
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be seen in Figures I5.15[ I5.16[ and 15.171 As can be seen in Figure I5.15[ utilizing the 
ip norm for p <^ 1 allows one to identify the most responsible neural regions for a 
specific movement. As p — )■ the number of identified neural regions is reduced. For 
different wrist movements, the same region is identified, however when comparing 
shoulder versus wrist movements different regions are identified with high activity, as 
can be seen in Figures 15.161 and I5.17[ respectively. 



(a) 



I 



I 



(c) 



4^ 



I 



(b) 



I 



Figure 5.15: (a) Heat map using £1/2 norm; (b) Heat map using £1/5 norm; (c) Heat 
map using £.0002 norm 
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I 

I 




I 



I 



(a) 



(b) 



I 



I 



(c) 



Figure 5.16: (a) Heat map using ^.0002 norm for wrist up; (b) Heat map using £.0002 
norm for wrist down; (c) Heat map using £.0002 norm for wrist right 



I 



(a) 



(b) 



^ .^^^^ ^.^ , . 1^ Heat map for wrist up versus for a 
shoulder up for a = 1, jo = 1. 



Figure 5.17: 



1,1? = 1; (b) Heat map for 
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6 Conclusion 



To summarize, we have provided a very generalized and easy to implement classi- 
fication algorithm, based on the nonsmooth Tikhonov regularization and the least 
squares formulation in [5]. The method is robust and works well even if the problem 
is severely ill-posed. The algorithm not only provides the classifier, but is capable 
of detection using the force as discussed in the results section. By utilizing sparsity, 
one can minimize storage while determing the responsible neurons for this particular 
application. Future research includes more theoretical analysis of the methods pro- 
posed, as well as an extension to nonlinear classification problems via reproducing 
kernel Hilbert space techniques. 
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A Convergence of the Iterative Algorithm 

Multiplying (13. 6 p by to^"*"^ — w^, we obtain 

+ , , f I I - + - + (e, w'-^' - w% 



Define the function 



for X > and 



Then, 



£ \ 2 

to 2 X > £ 



max(52-p, |i(;fc|2-P) 2 
Since x — > ^e(x) is concave, we have 

max(£^ P, liy'^p P) 2 

and thus 

^ ^ 2^ ^ " max(£2-p, |wfc|2-P) 2 ' i _ ev ; 

(A.2) 

shows that is nonincreasing. Now, we give the following result. 

Theorem A.l. Fore > let {wk} be generated by (13.61) . Then, ,Jir{w^) is mono- 
tonically non increasing and Wk converges to the minimizer of defined by (\A.l\i . 



Proof. The monotonicity of has already been shown. Thus, we show that {w''} 
converges to the minimizer of J^. It follows from (I A. 2 1) that k'^loo < oo and 



E 

k=0 



and thus there exists subsequence of {w'^} and w* G such that 



lim Wk = lim w^"*"^ 



tf . 
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It follows from fl3.6 



H*Hw* + -^r^, — w* = H*e 



i.e., vj* minimizes J^. □ 



23 



